Qlik Replicate Pivotal Greenplum endpoint architecture overview

The following shows the Qlik Replicate Pivotal Greenplum endpoint system architecture for:

Full load

Full load is used to set up or refresh a data warehouse on the target by concurrently loading large amounts of data from source tables. Qlik Replicate initiates high-speed data extraction from source endpoints such as Oracle or Microsoft SQL Server, and then uses gpfdist and buffered load files for high-speed data loading into Pivotal Greenplum. The following shows the Pivotal Greenplum database architecture for full load.

The Pivotal Greenplum database architecture for full load: data flows through the Replication Server, passing through a Transform Filter and into Greenplum Integration. In the Bulk Loader, external tables are staged as similarly sized CSV files for a fast parallel load into the Greenplum cluster. The bulk-insert CSV files are moved into gpfdist and then out of the Replication Server into the target database.
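The gpfdist-based load described above can be sketched with standard Greenplum DDL. Everything here (host, port, staging directory, table and column names) is a hypothetical illustration, not the DDL Replicate actually generates:

```sql
-- Hypothetical sketch: gpfdist serves the CSV staging directory on the
-- Replication Server, e.g. started as:
--   gpfdist -d /staging/replicate -p 8081 &

-- A readable external table lets all Greenplum segments pull the
-- staged CSV files in parallel directly from gpfdist.
CREATE READABLE EXTERNAL TABLE ext_orders_load (
    order_id   bigint,
    customer   varchar(64),
    amount     numeric(12,2)
)
LOCATION ('gpfdist://replicate-host:8081/orders_*.csv')
FORMAT 'CSV' (DELIMITER ',' NULL '');

-- The parallel load into the target table is then a plain INSERT ... SELECT:
INSERT INTO orders SELECT * FROM ext_orders_load;
```

Because each segment reads its own slice of the files from gpfdist, throughput scales with the size of the cluster rather than being bottlenecked on the master.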

CDC

For incremental load, Qlik Replicate uses log-based change data capture (CDC). During CDC replication, Qlik Replicate creates external Web tables or external tables to load SQL statements into the target Pivotal Greenplum database. The statements are then applied to the target tables. The following shows the Pivotal Greenplum database architecture for CDC.

The Pivotal Greenplum database architecture for CDC: data passes through in-memory stream processing and then through either transactional CDC, where transactions are applied in order and in real time, or batch-optimized CDC, where change records are consolidated to minimize the number of transactions applied to the target. In either case, the data then passes through gpfdist to the target database.
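The external web tables mentioned above can be sketched as follows; the command, file path, and names are assumptions for illustration only, not Replicate's actual generated objects:

```sql
-- Hypothetical sketch: an external web table whose rows come from a
-- command executed at query time (here, reading a file of change
-- records written by the Replication Server).
CREATE READABLE EXTERNAL WEB TABLE ext_orders_changes (
    op        char(1),        -- assumed I/U/D operation flag
    order_id  bigint,
    customer  varchar(64),
    amount    numeric(12,2)
)
EXECUTE 'cat /staging/replicate/orders_changes.csv' ON MASTER
FORMAT 'CSV';

-- A batch-optimized apply could then consolidate the captured changes
-- into a minimal pair of statements against the target table:
DELETE FROM orders USING ext_orders_changes c
 WHERE orders.order_id = c.order_id
   AND c.op IN ('U', 'D');

INSERT INTO orders (order_id, customer, amount)
SELECT order_id, customer, amount
  FROM ext_orders_changes
 WHERE op IN ('I', 'U');
```

The DELETE-then-INSERT pattern shows why batch-optimized CDC reduces load on the target: many individual row changes collapse into two set-based statements.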
